Pogany: A Tangible Cephalomorphic Interface for Expressive Facial Animation

Author

  • Christian Jacquemin
Abstract

A head-shaped input device is used to produce expressive facial animations. The physical interface is divided into zones, and each zone controls an expression on a smiley or on a virtual 3D face. Through contacts with the interface, users can generate basic or blended expressions. To evaluate the interface and to analyze the behavior of the users, we performed a study made of three experiments in which subjects were asked to reproduce simple or more subtle expressions. The results show that the subjects easily accept the interface and become engaged in a pleasant affective relationship that makes them feel as if they were sculpting the virtual face. This work shows that anthropomorphic interfaces can be used successfully for intuitive affective expression.

1 Anthropomorphic Devices for Affective Communication

We have designed and built a head-shaped tangible interface for the generation of facial expressions through intuitive contact or proximity gestures. Our purpose is to offer a new medium of communication that can involve the user in an affective loop [1]. The input to the interface consists of intentional and natural affective gestures, and the output is an embodiment of the emotional content of the input gestures. The output, either a facial expression or a smiley, is used as feedback to the user so that she can both tune her interactions with the interface according to the output (cognitive feedback) and feel the emotions expressed by the virtual actor or the smiley (affective feedback). The input device is a hollow resin head with holes and an internal video camera that captures the positions of the fingers on the interface. The output is an interactive smiley or an expressive virtual 3D face (see figure 1). The user can control a wide range of expressions of the virtual avatar through correspondences between finger contacts and a set of basic expressions of emotions.
The interface is used as a means to display one's own expressions of emotions as well as a means to convey emotions through the virtual face. We take advantage of the anthropomorphic shape of the input, a stylized human head, to establish easily learnable correspondences between users' contacts and expressed emotions. Even though a doll head was used in an "early" design of tangible interfaces in the mid-90s [2], human shapes are more widely used as output interfaces (e.g. Embodied Conversational Agents) than as input devices. Through our study we show that anthropomorphic input interfaces are experienced as an engaging and efficient means for affective communication, particularly when they are combined with a symmetric output that mirrors the emotions conveyed by the input interface.

Fig. 1. Experimental setup.

2 Anthropomorphic Input/Output Device

We now examine in turn the two components of the interface: the physical tangible input device, and the virtual animated face together with the mapping between gestures and expressions. Three experimental setups have been proposed: two setups in which strongly marked expressions can be generated on an emoticon or a 3D virtual face, and a third, more attention-demanding experiment in which subtle and flexible expressions of the 3D face are controlled by the interface. At this point of the development, no social interaction is involved in our study, in order to focus first on the usability of the interface and on the ease of control of the virtual agent's expressions.

2.1 Input: Cephalomorphic Tangible Interface

The physical input part of the interface is based on the following constraints:

1. it should be able to capture intuitive gestures of the hands and fingers, as if the user were approaching someone's face;
2. direct contacts as well as gestures in the vicinity of the head should also be captured, in order to allow for a wide range of subtle inputs through distant interactions;
3. as suggested by the design study of SenToy [3], the shape of the physical interface should not have strongly marked traits that would make it look like a familiar face, or that would suggest predefined expressions;
4. the most expressive facial parts of the interface should be easily identified without the visual modality in order to allow for contact interaction: eyes, eyebrows, mouth, and chin should have clearly marked shapes.

The first constraint has oriented us towards multi-touch interaction techniques that can detect several simultaneous contacts. Since the second constraint prohibits the use of pressure-sensitive capture, which cannot report gestures without contact, we have chosen a vision-based capture device that is both multi-touch and proximity-sensitive. The interface is equipped with a video camera, and 43 holes are used to detect the positions of the fingers in the vicinity of the face (figure 2). In order to detect the positions and gestures of both hands, right and left symmetric holes play the same role in the mapping between interaction and facial animation. The holes are chosen among the 84 MPEG4 key points used for standard facial animation [4]. The points are chosen among the mobile key points of this formalism; for instance, points 10.* and 11.* for ear and hair are ignored. The underlying hypothesis for selecting these points is that, since they correspond to places in the face with high mobility, they also make sensible capture points for animation control.

Fig. 2. Cross section of the physical interface and list of capture holes.

The third constraint has oriented us towards an abstract facial representation that would hardly suggest a known human face. Since we nevertheless wanted the interface to be appealing for contact, caress, or nearby gestures, its aesthetics was a concern.
Its design is deliberately soft and non-angular; it is loosely inspired by Mademoiselle Pogany, a series of sculptures by the 20th-century artist Constantin Brancusi (figure 3). The eye and mouth reliefs are prominent enough to be detected by contact with the face (fourth constraint). The size of the device (14cm high) is similar to that of a joystick, about three times smaller than a human face. All the tests have been done with bare hands under normal lighting conditions (during day time with natural light and in the evening with regular office lighting).

Fig. 3. Overview of the physical interface and bimanual interaction.

2.2 Output: Expressive Smiley or Virtual 3D Face

A straightforward way to provide users with feedback on the use of the interface for affective communication is to associate their interactions with expressions of emotions on an animated face. We have used two types of faces: a smiley and a realistic 3D face with predefined or blended expressions. Of course, other types of correspondences can be established, and we do not claim that the physical interface should be restricted to the control of facial animation. Other mappings are under development, such as the use of the interface for musical composition. As a first step, however, we found it necessary to check that literal associations could work before turning to more elaborate applications.

The association of interactions with facial animations is performed in two steps. First, the video image is captured with the ffmpeg library and transformed into a bitmap of gray pixels. After a calibration phase, bitmaps are analyzed at each frame around each hole by computing the difference between the luminosity at calibration time and its current value. The activation of a capture hole is the ratio between its current luminosity and its luminosity at calibration time. The activation of a zone made of several holes is its highest hole activation.
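The per-frame analysis above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the calibration values and hole identifiers are invented, and activation is expressed here as an occlusion fraction (1 minus the luminosity ratio) so that a covered hole yields a value near 1, matching the later description of zone activation as a percentage of occlusion.

```python
# Sketch of the two-step activation analysis (hypothetical values).
# A finger covering a hole darkens it, so occlusion = 1 - current/calibration.

def hole_activation(current, calibration):
    """Occlusion of one capture hole, from its gray-level luminosity."""
    if calibration <= 0.0:
        return 0.0
    return max(0.0, 1.0 - current / calibration)

def zone_activation(zone_holes, current, calibration):
    """A zone's activation is that of its most occluded hole."""
    return max(hole_activation(current[h], calibration[h]) for h in zone_holes)

# Toy usage: three holes in a mouth zone, one of them half covered.
calibration = {"2.2": 200.0, "2.3": 200.0, "8.1": 180.0}
current = {"2.2": 200.0, "2.3": 100.0, "8.1": 180.0}
print(zone_activation(["2.2", "2.3", "8.1"], current, calibration))  # 0.5
```

Taking the maximum over the zone's holes means a single well-covered hole is enough to drive the zone's expression, which matches the paper's choice of the most occluded key point.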
In a second step, zone activations are associated with facial expressions. Each expression is a table of key-point transformations, a Face Animation Table (FAT) in MPEG4. The choice of the output expression depends on the rendering mode. In the non-blended mode, the expression associated with the most highly activated zone is chosen. In the blended mode, a weighted interpolation is made between the expressions associated with each activated zone. Facial animation is implemented in Virtual Choreographer (VirChor), an open-source interactive 3D rendering tool. VirChor stores the predefined FATs, receives the expression weights from the video analysis module, and produces the corresponding animations.

1 http://ffmpeg.mplayerhq.hu/

2.3 Basic and Blended Facial Expressions

The mapping between interactions and expressions relies on a partitioning of the face into six symmetrical zones, shown in the center part of the two images in figure 4. Each zone is associated with a single basic expression, and the level of activation of a zone is the percentage of occlusion of the most occluded key point in this zone. Thus hole occlusion by fingers is used to control expressions on the virtual faces (smiley or 3D face). All the zones are symmetrical, so that right- and left-handed subjects are offered the same possibilities of interaction. Two sets of 6 basic facial expressions, which the users could identify and reproduce quickly, were designed for the smiley and for the 3D face. For the smiley, the 6 expressions correspond to 5 basic emotions and a non-expressive face with closed eyes: angry face, surprised eyebrows, surprised mouth, happy mouth, sad mouth, closed eyes (see upper part of figure 4). Only the angry face expression involves both the upper and the lower parts of the face.
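The two rendering modes described above (selection of the most activated zone versus weighted interpolation of FATs) can be sketched as follows. The FAT tables here are hypothetical stand-ins: each simply maps MPEG4 key points to 2D displacements, whereas a real MPEG4 FAT also encodes per-vertex deformation.

```python
# Sketch of the non-blended and blended output modes (illustrative data).

def non_blended(activations):
    """Return the expression of the most activated zone."""
    return max(activations, key=activations.get)

def blended(activations, fats):
    """Weighted interpolation of the FATs of all activated zones."""
    total = sum(activations.values())
    if total == 0.0:
        return {}
    result = {}
    for zone, act in activations.items():
        w = act / total  # normalized expression weight
        for keypoint, (dx, dy) in fats[zone].items():
            px, py = result.get(keypoint, (0.0, 0.0))
            result[keypoint] = (px + w * dx, py + w * dy)
    return result

# Hypothetical FATs for two zones, keyed by MPEG4 key-point names.
fats = {
    "brows": {"4.1": (0.0, 1.0)},
    "mouth": {"8.1": (0.0, -1.0), "4.1": (0.0, 0.0)},
}
acts = {"brows": 0.6, "mouth": 0.2}
print(non_blended(acts))           # brows
print(blended(acts, fats)["4.1"])  # approximately (0.0, 0.75)
```

Normalizing by the total activation keeps the blended displacement within the range spanned by the basic expressions, so partially covering two zones yields an intermediate face rather than an exaggerated one.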
Each basic expression of the 3D face (lower part of figure 4) is associated with an Action Unit (AU) of Ekman and Friesen's Facial Action Coding System [5]: a contraction of one or several muscles that can be combined to describe the expressions of emotions on a human face. Only 6 of the 66 AUs in this system are used; they are chosen so that they have simple and clear correspondences with the expressions of the smiley. The only noticeable difficulty is the correspondence between the angry face smiley, which involves modifications of the upper, lower, and central parts of the face, and the associated 3D expression of AU4 (Brow Lowerer), which only involves the upper part of the face. The 3D basic face expressions are deliberately associated with AUs instead of more complex expressions in order to facilitate the recognition of blended expressions in the third task of the experiment. In this task, the users have to guess which basic expressions are involved in the synthesis of complex expressions resulting from the weighted interpolation of AUs. Through this design, only a small subset of facial expressions can be produced. They are chosen so that they can be easily distinguished. More subtle expressions could be obtained by augmenting the number of zones, either through a larger resin cast with more holes or through this version of the interface with fewer holes in each zone. The 3D animation of each basic expression is made by displacing the MPEG4 key points. Since these expressions are restricted to some specific parts of the human face, they only involve a small subset of the 84 MPEG4 key points. For example, the basic expression associated with AU2 (Outer Brow Raiser) is based on the displacement of key points 4.1 to 4.6 (eyebrows), while the expression of AU12 relies on key points 2.2 to 2.9 (inner mouth) and 8.1 to 8.8 (outer mouth).

2 http://virchor.sf.net/

Fig. 4. Mapping between face zones and emoticons or virtual face expressions.
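The key-point subsets named above can be illustrated as simple displacement tables. Only the subsets (4.1 to 4.6 for AU2; 2.2 to 2.9 and 8.1 to 8.8 for AU12) follow the paper; the displacement magnitudes, the 2D representation, and the `apply_fat` helper are invented for the sketch.

```python
# Illustrative displacement tables for two of the six AUs (made-up values).

AU2_OUTER_BROW_RAISER = {f"4.{i}": (0.0, 0.3) for i in range(1, 7)}
AU12_LIP_CORNER_PULLER = {
    **{f"2.{i}": (0.0, 0.1) for i in range(2, 10)},  # inner mouth, 2.2-2.9
    **{f"8.{i}": (0.1, 0.15) for i in range(1, 9)},  # outer mouth, 8.1-8.8
}

def apply_fat(neutral, fat, intensity=1.0):
    """Displace neutral key-point positions by a FAT-like table,
    scaled by an expression intensity (e.g. a zone activation)."""
    posed = dict(neutral)
    for kp, (dx, dy) in fat.items():
        x, y = posed.get(kp, (0.0, 0.0))
        posed[kp] = (x + intensity * dx, y + intensity * dy)
    return posed

# Toy usage: raise the outer brows at half intensity.
neutral = {"4.1": (-1.0, 2.0), "8.1": (0.5, -1.0)}
posed = apply_fat(neutral, AU2_OUTER_BROW_RAISER, intensity=0.5)
print(posed["4.1"])  # approximately (-1.0, 2.15)
```

Because each AU touches only its own key-point subset, several AU tables can be applied to the same neutral pose without interfering, which is what makes the blended expressions of the third task decomposable by the users.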
We now turn to the usability study of the interface, in which the users were asked to reproduce basic or blended expressions of emotions on a virtual face through interactions with the physical interface.

3 Usability Study: Control of Emoticons or Facial Expressions through Pogany

As for the SenToy design experiment [3], our purpose is to check whether a user can control a virtual character's expressions (here the face) through a tangible interface that represents the same part of the body. Our usability study is intended to verify (1) that users can quickly recognize facial expressions from a model, and (2) that they can reproduce them at various levels of difficulty. Last, we wanted to let the users express themselves about their feelings during the experiment and their relationship to the device.



Publication date: 2007